Abstract: Consider a team of agents in the plane searching for
and visiting target points that appear in a bounded environment 
according to a stochastic renewal process with a known absolutely 
continuous spatial distribution. Agents must detect targets with 
limited-range onboard sensors. It is desired to minimize the 
expected waiting time between the appearance of a target point, 
and the instant it is visited. When the sensing radius is small, 
the system time is dominated by time spent searching, and it is 
shown that the optimal policy requires the agents to search a 
region at a relative frequency proportional to the square root 
of its renewal rate. On the other hand, when targets appear 
frequently, the system time is dominated by time spent servicing 
known targets, and it is shown that the optimal policy requires 
the agents to service a region at a relative frequency proportional 
to the cube root of its renewal rate. Furthermore, the presented 
algorithms in this case recover the optimal performance achieved 
by agents with full information of the environment. Simulation 
results verify the theoretical performance of the algorithms. 

I. Introduction 

A very active research area today addresses the coordination 
of several mobile agents: groups of autonomous robots and 
large-scale mobile networks are being considered for a broad 
class of applications, ranging from environmental monitoring, 
to search and rescue operations, and national security. Wide- 
area surveillance is one of the prototypical missions for Un- 
inhabited Aerial Vehicles (UAVs): low-altitude UAVs on such 
a mission must provide coverage of a region and investigate 
events of interest as they manifest themselves. In particular, 
we are interested in cases in which close-range information 
is required on targets detected by onboard sensors, and the 
UAVs must proceed to the locations to gather further on-site 
information. 

We address a routing problem for a team of agents in 
the plane: target points appear over time in a bounded en- 
vironment according to a stochastic renewal process with a 
known absolutely continuous spatial distribution. It is desired 
to stabilize the outstanding target queue and minimize the 
expected elapsed time between the appearance of a target 
point, and the instant it is visited (the system time). This is 
a formulation of the Dynamic Traveling Repairman Problem
(DTRP), introduced in [1] and thoroughly developed in [2],
[3]. Numerous algorithms are presented and analyzed in this
series of seminal works. Furthermore, the property of spatial
bias is studied. In particular, the authors analyze the problem and
develop policies under the constraint that a target's expected 
waiting time must be independent of its location. In addition to 
combinatorial and convex optimization, many of the solutions 
rely heavily on results from the relatively mature fields of 
facility location, probability and queueing theory. 

In an effort to address issues relevant in applications such
as autonomous mobile robotics, this paper focuses on a 
variation of the DTRP. We place limitations on the information 
available to the vehicles and analyze the effect on the system's 
achievable performance. In particular, we consider the case in 
which vehicles are not aware of the location of targets as they 
appear, but rather must detect them using on-board sensors 
with a limited range. 

The recent literature concerning problems of this class
is vast. Some recent work on the routing of nonholonomic
vehicles includes [4], [5]. Many of the new results for the small
sensing radius case are applicable to coverage problems [6],
[7], [8], in which the agents spread out with some sense
of balance, or comb the environment efficiently. This case
also has connections to the Persistent Area Denial (PAD) and
area coverage problems [9], [10], [11], [12]. Other works
are concerned with the generation of efficient cooperative
strategies for several mobile agents to move through a certain
number of given target points, possibly avoiding obstacles or
threats [13], [14], [15], [16]. Trajectory efficiency in these
cases is understood in terms of cost for the agents: in other
words, efficient trajectories minimize the total path length,
the time needed to complete the task, or the fuel/energy
expenditure. A related problem has been investigated as the
Weapon-Target Assignment (WTA) problem, in which mobile
agents are allowed to team up in order to enhance the
probability of a favorable outcome in a target engagement [17],
[18]. In other works addressing UAV task-assignment, target
locations are known and an assignment strategy is sought
that maximizes the global success rate [19], [20]. Moreover,
this work holds a place in the recent wave of investigation
into the cooperative control of multi-agent systems [21], [22].
Other works addressing cooperative task completion by UAVs
include [23], [24], [25].

Song and coworkers considered the problem of searching for
a static object emitting intermittent stochastic signals under a
limited sensing range, and analyzed the performance of standard
algorithms such as systematic sweep and random walks [26].
Due to the intermittent signals from the object, robots must 
perform a persistent search, thus making the work similar 
to ours. However, the authors assumed no prior information 
about the location of the target object is available; hence, their 
setting is equivalent to the assumption of a uniform spatial 
distribution. In our work, we explicitly consider non-uniform 
spatial distributions, which lead to different kinds of optimal 
policies. Mathew and Mezic presented an algorithm named 
Spectral Multiscale Coverage (SMC) to devise trajectories 
such that the spatial distribution of a patrol vehicle's position 
asymptotically matches a given function [27]. Similarly,
Cannata and Sgorbissa describe an algorithm that solves what they
call the Multirobot Controlled Frequency Coverage (MRCFC)
problem, in which a team of robots is required to repeatedly
visit a set of predefined locations according to a specified
frequency distribution [28]. We show that when attempting
to minimize discovery time, the desired spatial distribution of 
the agent's position is dependent on, but not equivalent to the 
underlying spatial distribution of incidents it must find. 

The main contributions of this paper are the following. 1) 
For the case of small sensing radius, we establish a new lower 
bound and 2) a new policy whose performance is tight with the 
lower bound. 3) In combination, these two contributions show 
that the optimal policy requires that agents search subregions 
of the environment at relative frequencies proportional to the 
square root of their renewal rates, i.e., searching is biased 
towards high density regions, but not as biased as the density 
function itself. 4) For the case of high target renewal rate, we 
present a new policy that works for agents with limited sensing 
capabilities. 5) Comparing the performance of our policy
with a previously known lower bound, we show that, as in the
full-information case, the optimal policy requires that agents
search subregions of the environment at relative frequencies 
proportional to the cube root of their renewal rates. 6) These 
results imply that the limited sensing capabilities do not 
adversely affect the optimal performance of the agents in 
this case, i.e., we recover the optimal performance of the 
full-information heavy load case. 7) Moreover, we provide 
scalable, decentralized strategies by which a multi-vehicle 
team can operate with the above mentioned algorithms, and 
retain optimal performance. Earlier versions of this work
considered only a single agent and uniform spatial distribution [29]
or a single agent and a known absolutely continuous spatial
distribution, but without analysis of the case of high target
renewal rate [30].

This paper is organized as follows. In Sec. II we formulate
the DTRP with limited sensing and review known results. In
Sec. III we offer a lower bound for this new problem. In
Sec. IV we present algorithms for the single agent, and compare
their performance with lower bounds. In Sec. V we adapt our
algorithms to the multiple vehicle setting. The final section
concludes the paper and notes possibilities for future research.

II. Problem Formulations and Previous Results

In this section, we formally introduce the dynamic vehicle
routing problem we wish to study, without the additional
limitations on sensing or motion constraints. We also review
results of well studied static vehicle routing problems, in which
the vehicles have full information, and travel cost is simply
Euclidean distance. The known performance limits for these
problems serve as reference points for results found on the
problem variations studied herein. They give insight as to how
the new constraints affect the efficiency of the system.

Given a set $\mathcal{D}_n \subset \mathbb{R}^2$ of $n$ points, the two-dimensional
Euclidean Traveling Salesman Problem (ETSP) is the problem
of finding the shortest tour (closed path) through all points in
$\mathcal{D}_n$; let $\mathrm{ETSP}(\mathcal{D}_n)$ be the length of such a tour. Furthermore,
we will make use of the following remarkable result.

Theorem 1 ([31], [32]): If the locations of the points in
$\mathcal{D}_n$ are independently and identically distributed (i.i.d.) with
compact support $Q \subset \mathbb{R}^2$, then with probability one

$$\lim_{n \to \infty} \frac{\mathrm{ETSP}(\mathcal{D}_n)}{\sqrt{n}} = \beta \int_Q \sqrt{\varphi(q)}\,dq, \qquad (1)$$

where $\beta > 0$ is a constant not depending on the distribution
of the points and where $\varphi$ is the density of the absolutely
continuous part of the distribution of the points.
The current best estimate of the constant is $\beta \approx 0.7120$ [33],
[34]. According to [35], if $Q$ is a "fairly compact and fairly
convex" set in the plane, then (1) provides an adequate
estimate of the optimal ETSP tour length for values of $n$ as low as
15. Interestingly, the asymptotic cost of the ETSP for uniform
point distributions is an upper bound on the asymptotic cost
for general point distributions, as can be proved by applying
Jensen's inequality to (1). In other words, if $A$ is the area of
set $Q$,

$$\lim_{n \to \infty} \frac{\mathrm{ETSP}(\mathcal{D}_n)}{\sqrt{n}} = \beta \int_Q \sqrt{\varphi(q)}\,dq \le \beta \sqrt{A}$$

with probability one.
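
The Jensen step behind this bound can be written out explicitly. The following short derivation is a sketch that assumes $\varphi$ integrates to one over $Q$ and treats $dq/A$ as the uniform probability measure on $Q$; it uses only the concavity of the square root.

```latex
\int_Q \sqrt{\varphi(q)}\,dq
  \;=\; A \int_Q \sqrt{\varphi(q)}\,\frac{dq}{A}
  \;\le\; A \sqrt{\int_Q \varphi(q)\,\frac{dq}{A}}
  \;=\; A \sqrt{\frac{1}{A}}
  \;=\; \sqrt{A}.
```

Equality holds when $\varphi$ is constant on $Q$, i.e., $\varphi(q) = 1/A$, which is the uniform case.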

We will present algorithms that require online solutions 
of large ETSPs. In practice, these solutions are computed 
using heuristics such as Lin-Kernighan's [36] or approximation
algorithms such as Christofides' [37]. If the algorithm used in
practice guarantees a performance within a constant factor of
the optimal, the effect on the performance of our algorithms
can be modeled as a scaling of the constant $\beta$.

The following is a formulation of the Dynamic Traveling
Repairman Problem (DTRP) [2], [38], [5]. Let $Q \subset \mathbb{R}^2$ be a
convex, compact domain on the plane, with non-empty interior;
we will refer to $Q$ as the environment. Let $A$ be the area of
$Q$. Target points are i.i.d. and generated according to a
spatio-temporal Poisson point process, with temporal intensity
$\lambda > 0$, and an absolutely continuous spatial distribution
described by the density function $\varphi : Q \to \mathbb{R}_+$. The spatial
density function $\varphi$ is $K$-Lipschitz,
$|\varphi(q_1) - \varphi(q_2)| \le K \|q_1 - q_2\|$ for all $q_1$ and $q_2$ in $Q$,
and bounded above and below,
$0 < \underline{\varphi} \le \varphi(q) \le \overline{\varphi} < \infty$ for all $q$ in $Q$, and is normalized
in such a way that $\int_Q \varphi(q)\,dq = 1$. For any $t > 0$,
$\mathcal{P}(t)$ is a random collection of points in $Q$, representing the
targets generated in the time interval $[0, t)$. The following are
consequences of the properties defining Poisson processes.

• The total numbers of targets generated in two disjoint
time-space regions are independent random variables.

• The total number of targets generated in an interval
$[t, t + \Delta t)$ in a measurable set $S \subseteq Q$ satisfies

$$\Pr\left[\mathrm{card}\big((\mathcal{P}(t + \Delta t) - \mathcal{P}(t)) \cap S\big) = k\right] = \frac{\exp(-\lambda \Delta t \cdot \varphi(S))\,(\lambda \Delta t \cdot \varphi(S))^k}{k!}$$

for all $k \in \mathbb{N}$, and hence

$$\mathrm{E}\left[\mathrm{card}\big((\mathcal{P}(t + \Delta t) - \mathcal{P}(t)) \cap S\big)\right] = \lambda \Delta t \cdot \varphi(S),$$

where $\varphi(S)$ is a shorthand for $\int_S \varphi(q)\,dq$.
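
To make the target-generation model concrete, the following is a minimal simulation sketch, not part of the original formulation. It assumes a rectangular environment, draws arrival epochs from exponential inter-arrival times, and draws locations by rejection sampling against an upper bound on the density. The function names, the example density, and all parameter values are hypothetical.

```python
import random


def sample_targets(lam, horizon, density, density_max,
                   width=1.0, height=1.0, rng=None):
    """Sample (time, x, y) targets from a spatio-temporal Poisson process.

    Arrival epochs form a Poisson process of rate lam on [0, horizon).
    Locations are i.i.d. with the given density on a width-by-height
    rectangle, drawn by rejection sampling against density_max.
    """
    rng = rng or random.Random()
    targets = []
    t = rng.expovariate(lam)  # first arrival epoch
    while t < horizon:
        while True:
            x, y = rng.uniform(0.0, width), rng.uniform(0.0, height)
            # Accept with probability density(x, y) / density_max.
            if rng.uniform(0.0, density_max) <= density(x, y):
                break
        targets.append((t, x, y))
        t += rng.expovariate(lam)  # exponential inter-arrival time
    return targets


def example_density(x, y):
    # Hypothetical piecewise-constant density on the unit square:
    # 1.5 on the left half, 0.5 on the right half (integrates to one).
    return 1.5 if x < 0.5 else 0.5


targets = sample_targets(lam=5.0, horizon=200.0, density=example_density,
                         density_max=1.5, rng=random.Random(0))
in_left = sum(1 for _, x, _ in targets if x < 0.5)
print(len(targets), in_left)
```

Under these assumed values, the expected total count is $\lambda \Delta t = 1000$ and the expected count in the left half is $\lambda \Delta t \cdot \varphi(S) = 750$, so the printed numbers should be close to those means.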
A service request is fulfilled when one of m mobile agents, 
modeled as point masses, moves to the target point associated 
with it; m is a possibly large, but finite number. In later sec- 
tions, we will introduce limitations on the agent's awareness of 
targets, and nonholonomic constraints on the agent's motion.
Let us assume the agents have holonomic (single integrator)
dynamics, and that all agents are aware of a target's location
upon its arrival epoch. Let $p(t) = (p_1(t), p_2(t), \ldots, p_m(t)) \in Q^m$
be a vector describing the positions of the agents at time
$t$. The agents are free to move, with bounded speed, within
the workspace $Q$; let $v$ be the maximum speed of the agents.
In other words, the dynamics of the agents are described by
differential equations of the form

$$\frac{d p_i(t)}{dt} = u_i(t), \quad \text{with } \|u_i(t)\| \le v, \quad \forall t \ge 0,\ i \in I_m,$$

where we denote $I_m = \{1, \ldots, m\}$. The agents are identical,
and have unlimited fuel and target-servicing capacity. 

Let $\mathcal{D} : t \to 2^Q$ indicate (a realization of) the stochastic
process obtained combining the target generation process
$\mathcal{P}$ and the removal process caused by the agents visiting
outstanding requests. The random set $\mathcal{D}(t) \subset Q$ represents
the demand, i.e., the service requests outstanding at time $t$; let
$n(t) = \mathrm{card}(\mathcal{D}(t))$.

A motion coordination policy is a function that determines
the actions of each vehicle over time, based on the
locally-available information. For the time being, we will denote
these functions as $\pi = (\pi_1, \pi_2, \ldots, \pi_m)$, but do not explicitly
state their domain; the output of these functions is a velocity
command for each agent. Our objective is the design of motion
coordination strategies that allow the mobile agents to fulfill
service requests efficiently (we will make this more precise in
the following).

A policy $\pi = (\pi_1, \pi_2, \ldots, \pi_m)$ is said to be stabilizing if,
under its effect, the expected number of outstanding targets
does not diverge over time, i.e., if

$$\overline{n}_\pi = \lim_{t \to \infty} \mathrm{E}\left[n(t) : \text{agents execute policy } \pi\right] < \infty.$$

Intuitively, a policy is stabilizing if the mobile agents are able
to visit targets at a rate that is, on average, at least as fast
as the rate at which new service requests are generated.

Let $T_j$ be the time elapsed between the generation of the
$j$-th target, and the time it is fulfilled. If the system is stable,
then the following balance equation (also known as Little's
formula [39]) holds:

$$\overline{n}_\pi = \lambda \overline{T}_\pi, \qquad (2)$$

where $\overline{T}_\pi := \lim_{j \to \infty} \mathrm{E}[T_j]$ is the system time under policy $\pi$,
i.e., the expected time a target must wait before being fulfilled,
given that the mobile agents follow the strategy defined by $\pi$.
Note that the system time $\overline{T}_\pi$ can be thought of as a measure
of the quality of service collectively provided by the agents.

At this point we can finally state our problem: we wish
to devise a policy that is (i) stabilizing, and (ii) yields a
quality of service (system time) achieving, or approximating,
the theoretical optimal performance given by

$$\overline{T}_{\mathrm{opt}} = \inf_{\pi \text{ stabilizing}} \overline{T}_\pi.$$

In the following, we are interested in designing control
policies that provide constant-factor approximations of the
optimal achievable performance; a policy $\pi$ is said to
provide a constant-factor approximation of $\kappa$ if $\overline{T}_\pi \le \kappa \overline{T}_{\mathrm{opt}}$.
Furthermore, a policy is called spatially unbiased if, under its
action, a target's expected waiting time is independent of its
location, and spatially biased otherwise. We shall investigate
how this spatial constraint affects the achievable system time,
i.e., we shall find lower bounds and develop algorithms within
the class of spatially unbiased policies, and without. Moreover,
we are interested in decentralized, scalable, adaptive control 
policies, that rely only on local exchange of information 
between neighboring vehicles, and do not require explicit 
knowledge of the global structure of the network. 

The DTRP with general demand distribution is studied 
in Q, where the form of the optimal system time in heavy load 
is first derived. However, there remained a constant-factor gap 
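between the lower and upper bounds. The coefficient of the
lower bound was tightened from $(2/(3\sqrt{2\pi}))^2$ to $\beta^2/2$ in [40]:

$$\lim_{\lambda \to \infty} \frac{\overline{T}_{\mathrm{opt}}}{\lambda} = \frac{\beta^2}{2 m^2 v^2} \left( \int_Q \sqrt{\varphi(q)}\,dq \right)^2, \qquad (3)$$

within the class of spatially unbiased policies, and

$$\lim_{\lambda \to \infty} \frac{\overline{T}_{\mathrm{opt}}}{\lambda} = \frac{\beta^2}{2 m^2 v^2} \left( \int_Q \varphi(q)^{2/3}\,dq \right)^3, \qquad (4)$$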

within the class of spatially biased policies. As mentioned 
in 0, 

$$A \ge \left( \int_Q \sqrt{\varphi(q)}\,dq \right)^2 \ge \left( \int_Q \varphi(q)^{2/3}\,dq \right)^3$$

with equality holding throughout if and only if $\varphi(q) = 1/A$
for all $q \in Q$. In other words, uniform density is the
worst possible, and any non-uniformity will strictly lower the
optimal system time. This is analogous with the length of the
stochastic ETSP, i.e., Eq. (1). Furthermore, the optimal system
time for spatially biased policies is lower than that of spatially 
unbiased policies. This follows intuition as spatially unbiased 
waiting time is a constraint limiting the realm of available 
policies. 
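
The ordering of the three quantities in the inequality above can be checked numerically. The following is a small sketch that assumes a piecewise-constant density given by region areas and density values; the function name and the two example densities are hypothetical and chosen only for illustration.

```python
import math


def heavy_load_functionals(areas, densities):
    """Return (A, (int sqrt(phi))^2, (int phi^(2/3))^3).

    The density is assumed piecewise constant: it equals densities[j]
    on a region of area areas[j], and it must integrate to one.
    """
    total_mass = sum(a * d for a, d in zip(areas, densities))
    if abs(total_mass - 1.0) > 1e-9:
        raise ValueError("density must integrate to one")
    area = sum(areas)
    unbiased = sum(a * math.sqrt(d) for a, d in zip(areas, densities)) ** 2
    biased = sum(a * d ** (2.0 / 3.0) for a, d in zip(areas, densities)) ** 3
    return area, unbiased, biased


# Uniform density on a unit-area environment: all three values coincide.
uniform = heavy_load_functionals([0.5, 0.5], [1.0, 1.0])
# Skewed density: 1.8 on one half, 0.2 on the other.
skewed = heavy_load_functionals([0.5, 0.5], [1.8, 0.2])
print(uniform)
print(skewed)
```

For the skewed example the middle value is exactly $0.8$, since $(0.5\sqrt{1.8} + 0.5\sqrt{0.2})^2 = (1.8 + 0.2 + 2\sqrt{0.36})/4$, and the last value is smaller still, which illustrates the gap between the unbiased and biased heavy-load coefficients.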

In addition to the above formulation of the DTRP, we add 
a constraint on the information available to the agents. Agents 
are not aware of a target's existence or location upon its 
arrival epoch. Rather, they must detect targets with limited- 
range onboard sensors, i.e., they must come within the local 
vicinity of the target. Let us call this variation the Limited 
Sensing DTRP. Formally, this means the set $\mathcal{D}(t)$ is in general
not entirely known to all agents, due to the fact that the sensing
range is limited. For the sake of simplicity, we will model the
sensing region of an agent as a disk of radius $r$ centered at
the agent's position; indicate with

$$d(p_i) = \{ q \in \mathbb{R}^2 : \|p_i - q\| \le r \}$$

the sensing region for the $i$-th agent. Other shapes of the
sensor footprint can be considered with minor modifications
to our analysis, and affect the results at most by a constant.
We will assume that the sensor footprint is small enough that
it is contained within $Q$ for at least one position of the agent;
moreover, we will be interested in analyzing the effect of a
limited sensor range as $r \to 0^+$.

III. Lower Bounds on the Optimal System Time

Every target must be detected by an agent's sensor before
it is serviced. Let $T_j^{\mathrm{detect}}$ be the time elapsed between the
generation of the $j$-th target, and the time it is first detected
by any agent's sensor. Consider an alternative scenario we call
the Detection DTRP, in which targets are fulfilled at the instant
they are first detected, and
$\overline{T}_\pi^{\mathrm{detect}} := \lim_{j \to \infty} \mathrm{E}[T_j^{\mathrm{detect}}]$ is
the detection time under policy $\pi$. For all targets,
$T_j^{\mathrm{detect}} \le T_j$, and thus

$$\inf_{\pi \text{ stabilizing}} \overline{T}_\pi \ge \inf_{\pi \text{ stabilizing}} \overline{T}_\pi^{\mathrm{detect}}.$$

In other words, a lower bound on the achievable detection time
is also a lower bound on the achievable system time, $\overline{T}_{\mathrm{opt}}$. We
leverage this fact in the following theorem.

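Theorem 2: The optimal system time for the DTRP with
limited sensing satisfies

$$\lim_{r \to 0^+} \overline{T}_{\mathrm{opt}}\, r \ge \frac{A}{4 m v}, \qquad (5)$$

within the class of spatially unbiased policies and

$$\lim_{r \to 0^+} \overline{T}_{\mathrm{opt}}\, r \ge \frac{1}{4 m v} \left( \int_Q \sqrt{\varphi(q)}\,dq \right)^2, \qquad (6)$$

within the class of spatially biased policies.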

Proof: The probability that a target's location is within a
sensor footprint at the time of arrival is bounded by

$$\Pr\left[q \in \cup_{i=1}^m d(p_i)\right] \le \overline{\varphi}\, m \pi r^2.$$

In this case, $T^{\mathrm{detect}} = 0$. However, for any fixed number of
vehicles $m$ and distribution $\varphi$,

$$\lim_{r \to 0^+} \Pr\left[q \in \cup_{i=1}^m d(p_i)\right] \le \lim_{r \to 0^+} \overline{\varphi}\, m \pi r^2 = 0,$$

and therefore,

$$\lim_{r \to 0^+} \Pr\left[q \notin \cup_{i=1}^m d(p_i)\right] = 1.$$

In this limit, from the perspective of a point $q \in Q$, the actions
of a given stabilizing policy $\pi$ are described by the following
(possibly random) sequence of variables: the lengths of the
time intervals during which the point is not contained in any
sensor footprint, $Y_k(q)$. In the following, we use the notation

$$\mathrm{E}[Y_\pi(q)] = \lim_{k \to \infty} \mathrm{E}\left[Y_k(q) : \text{agents execute policy } \pi\right].$$

Due to random incidence [35], [41], a target's detection time,
conditioned upon its location, can be written as

$$\lim_{r \to 0^+} \mathrm{E}\left[T^{\mathrm{detect}} \mid q\right] = \lim_{r \to 0^+} \Pr\left[q \notin \cup_{i=1}^m d(p_i)\right] \cdot \frac{\mathrm{E}[Y_\pi^2(q)]}{2\,\mathrm{E}[Y_\pi(q)]}$$
$$= \frac{\mathrm{E}[Y_\pi(q)]^2 + \mathrm{Var}[Y_\pi(q)]}{2\,\mathrm{E}[Y_\pi(q)]}$$
$$\ge \frac{1}{2}\,\mathrm{E}[Y_\pi(q)],$$

where $\mathrm{E}[Y_\pi^2(q)]$ and $\mathrm{Var}[Y_\pi(q)]$ are, respectively, the second
moment and variance of the sequence $Y_k(q)$ under the actions
of policy $\pi$. In other words, for fixed $\mathrm{E}[Y_\pi(q)]$, the system
time is minimized if $\mathrm{Var}[Y_\pi(q)] = 0$. This occurs under the
actions of a deterministic policy with exactly regular time
intervals between searching location $q$.

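Define $f(q)$ as the time-averaged frequency at which
point $q$ is searched under the actions of a policy. Note that
$f(q) = 1/\mathrm{E}[Y_\pi(q)]$, and so

$$\overline{T}^{\mathrm{detect}} = \int_Q \varphi(q)\,\mathrm{E}\left[T^{\mathrm{detect}} \mid q\right] dq \ge \frac{1}{2} \int_Q \frac{\varphi(q)}{f(q)}\,dq.$$

The $m$-vehicle system is capable of searching at a maximum
rate of $2mvr$ (area per unit time), and so the average searching
frequency is bounded by $\int_Q f(q)/A\,dq \le 2mvr/A$. Recalling
$\overline{T}_{\mathrm{opt}} \ge \overline{T}^{\mathrm{detect}}$, we have the following minimization problem:

$$2\overline{T}_{\mathrm{opt}} \ge \min_f \int_Q \frac{\varphi(q)}{f(q)}\,dq \quad \text{subject to} \quad \int_Q f(q)\,dq \le 2mvr \ \text{ and } \ f(q) \ge 0.$$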

Since the objective function is convex in $f(q)$ and the
constraints are linear, the above is an infinite-dimensional convex
program. Relaxing the constraint with a multiplier $\Gamma$, we arrive
at the Lagrange dual:

$$2\overline{T}_{\mathrm{opt}} \ge \min_{f(q) \ge 0} \int_Q \frac{\varphi(q)}{f(q)}\,dq + \Gamma \left( \int_Q f(q)\,dq - 2mvr \right)$$
$$= \int_Q \min_{f(q) \ge 0} \left[ \frac{\varphi(q)}{f(q)} + \Gamma f(q) \right] dq - 2mvr\,\Gamma.$$



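Differentiating the integrand with respect to $f(q)$ and setting
it equal to zero, we find the pair

$$f^*(q) = \sqrt{\frac{\varphi(q)}{\Gamma^*}} \quad \text{and} \quad \Gamma^* = \left( \frac{1}{2mvr} \int_Q \sqrt{\varphi(q)}\,dq \right)^2 \qquad (7)$$

satisfy the Kuhn-Tucker necessary conditions for optimality [42],
and since it is a convex program, these conditions
are sufficient to ensure global optimality. Upon substitution,
(6) is proved. On the other hand, the constraint of unbiased
service requires that

$$\mathrm{E}\left[T^{\mathrm{detect}} \mid q\right] = \frac{1}{2 f(q)} = \overline{T}^{\mathrm{detect}}, \quad \forall q \in Q.$$

Substituting into $\int_Q f(q)\,dq \le 2mvr$, we get

$$\frac{A}{2\overline{T}^{\mathrm{detect}}} \le 2mvr.$$

Rearranging and recalling $\overline{T}_{\mathrm{opt}} \ge \overline{T}^{\mathrm{detect}}$, we arrive at (5). ■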
Oftentimes, a tight lower bound offers insight into the
optimal solution of a problem. Assuming that this lower bound
is tight, Eq. (7) suggests that in the spatially biased small
sensing-range case, the optimal policy searches a point $q$ at
regular intervals at a relative frequency proportional to $\sqrt{\varphi(q)}$.
Moreover, we have presented new lower bounds on the optimal
system time related to the searching capability of the agents
and the necessary structure of any stabilizing policy. In the
following sections, we will use these bounds to evaluate the
performance of our proposed policies for the case of small
sensor range.
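
As a numerical illustration of the two bounds in Theorem 2, the sketch below evaluates the objective $\frac{1}{2}\int_Q \varphi(q)/f(q)\,dq$ for a piecewise-constant density under two search-frequency allocations that both use the full budget $2mvr$: a uniform allocation and one proportional to $\sqrt{\varphi}$. The function names and the numerical values are hypothetical; the code only evaluates the bound, it does not simulate a policy.

```python
import math


def sqrt_allocation(areas, densities, m, v, r):
    """Frequencies proportional to sqrt(phi_j), with sum_j A_j f_j = 2 m v r."""
    budget = 2.0 * m * v * r
    norm = sum(a * math.sqrt(d) for a, d in zip(areas, densities))
    return [budget * math.sqrt(d) / norm for d in densities]


def uniform_allocation(areas, m, v, r):
    """Equal frequency everywhere, using the same search budget 2 m v r."""
    budget = 2.0 * m * v * r
    return [budget / sum(areas)] * len(areas)


def detection_time_bound(areas, densities, freqs):
    """(1/2) * integral of phi / f for piecewise-constant phi and f."""
    return 0.5 * sum(a * d / f for a, d, f in zip(areas, densities, freqs))


# Hypothetical instance: unit-area environment, density 1.8 / 0.2 on two halves.
areas, densities = [0.5, 0.5], [1.8, 0.2]
m, v, r = 1, 1.0, 0.01

f_sqrt = sqrt_allocation(areas, densities, m, v, r)
f_unif = uniform_allocation(areas, m, v, r)
bound_sqrt = detection_time_bound(areas, densities, f_sqrt)
bound_unif = detection_time_bound(areas, densities, f_unif)
print(bound_unif * r, bound_sqrt * r)
```

With these assumed values the uniform allocation gives $\overline{T}\,r = A/(4mv) = 0.25$, matching (5), while the square-root allocation gives $(\int_Q \sqrt{\varphi})^2/(4mv) = 0.2$, matching (6). The dense half is searched three times as often as the sparse half, since $\sqrt{1.8/0.2} = 3$.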

IV. Algorithms and Policies for a Single Agent 

In this section, we present four policies for the
single-vehicle Limited Sensing DTRP, and prove their respective
optimality in different limiting cases and classes of the problem,
namely, the case of small sensor range (spatially unbiased and 
biased), and the case of heavy load (spatially unbiased and 
biased). To begin, we present two algorithms (subroutines used 
by the policies) by which an agent can service targets in a 
given region of the environment. The first is designed for the 
small-sensor case, and the second is designed for the heavy 
load case. We analyze their properties in their respective cases. 

In the following, we consider a convex subregion $S \subseteq Q$
of area $A_S$. The targets in $S$ are generated by a local
Poisson process with time-intensity $\lambda_S = \lambda \varphi(S)$ and spatial
distribution $\varphi_S(q) = \varphi(q)/\varphi(S)$ for all $q \in S$. Note that $\varphi_S$
is normalized such that $\varphi_S(S) = 1$. The first algorithm is
designed for the case of small sensing-range.

SWEEP-SERVICE 

The description of this algorithm requires the use of an 
inertial Cartesian coordinate frame. The algorithm is defined 
as follows. 

• Partition S into elements of width 2r with lines parallel 
to the x-axis. 

• Define a strip as the bounding rectangle of an element of 
the partition, with sides parallel to the coordinate axes, 
and minimum side-lengths. 

• Plan a path running along the longitudinal bisector of 
each strip, visiting all strips from top-to-bottom, connect- 
ing adjacent strip bisectors by their endpoints. 

• Execute this path and visit targets as they are detected 
in the following manner. If a target is detected in the 
current strip and it is in front of the agent (with respect 
to the direction the agent is moving on the path), then 
the agent continues on the path until its position has the 
same x-coordinate as the target's. It then departs from
the path, moving directly to the target, returning to the 
point of departure, and continuing on the path. If a target 
is detected outside the current strip, or behind the agent, 
then it is ignored. 

Fig. 1. Depiction of an agent executing SWEEP-SERVICE.

We now analyze the length $L_{\mathrm{swp}}(S)$ of the path planned by
the algorithm.

Proposition 3: The length $L_{\mathrm{swp}}(S)$ of the path planned by
SWEEP-SERVICE for region $S$ satisfies

$$\lim_{r \to 0^+} L_{\mathrm{swp}}(S)\, r \le \frac{A_S}{2}.$$

Proof: Consider a grid of squares with sides of length $2r$
parallel to the coordinate axes. Denote $N_{\mathrm{sq}}(S)$ as the number
of squares with nonzero intersection with $S$. The sum of the
lengths of all strips can be bounded by $2r N_{\mathrm{sq}}(S)$. Bound
the region $S$ with a rectangle whose sides are parallel with
the coordinate axes, and denote the length of its perimeter
$P(S)$. The length of the path between the endpoints of the
strip bisectors can be bounded above by $P(S)$. Thus, the total
path length satisfies

$$L_{\mathrm{swp}}(S) \le 2r N_{\mathrm{sq}}(S) + P(S).$$

But the number of squares satisfies [43]

$$\lim_{r \to 0^+} N_{\mathrm{sq}}(S)\, r^2 = \frac{A_S}{4},$$

and so

$$\lim_{r \to 0^+} L_{\mathrm{swp}}(S)\, r \le \lim_{r \to 0^+} \left( 2 N_{\mathrm{sq}}(S)\, r^2 + P(S)\, r \right) = \frac{A_S}{2}. \ ■$$
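
The planned path and the limit in Proposition 3 can be illustrated for the simplest region. The sketch below assumes $S$ is an axis-aligned rectangle, builds the back-and-forth path along the strip bisectors, and evaluates $L_{\mathrm{swp}}(S)\,r$ for two radii. The function names are hypothetical, and target detours are not modeled.

```python
import math


def sweep_path(width, height, r):
    """Waypoints of the sweep path over the rectangle [0,width] x [0,height].

    The rectangle is cut into horizontal strips of width 2r (the last strip
    may be thinner). The path runs along each strip's bisector, top to
    bottom, alternating direction so adjacent bisectors join at endpoints.
    """
    # Small tolerance so that an exact multiple of 2r is not rounded up.
    n_strips = math.ceil(height / (2.0 * r) - 1e-9)
    waypoints = []
    for k in range(n_strips):
        top = height - 2.0 * r * k
        bottom = max(0.0, height - 2.0 * r * (k + 1))
        y = 0.5 * (top + bottom)  # bisector of the k-th strip
        xs = (0.0, width) if k % 2 == 0 else (width, 0.0)
        waypoints.append((xs[0], y))
        waypoints.append((xs[1], y))
    return waypoints


def path_length(waypoints):
    """Total Euclidean length of the polyline through the waypoints."""
    return sum(math.dist(a, b) for a, b in zip(waypoints, waypoints[1:]))


lr_coarse = path_length(sweep_path(1.0, 1.0, 0.01)) * 0.01
lr_fine = path_length(sweep_path(1.0, 1.0, 0.001)) * 0.001
print(lr_coarse, lr_fine)
```

For the unit square, $A_S/2 = 0.5$, and the product $L_{\mathrm{swp}}(S)\,r$ approaches that value from above as $r$ shrinks, because the connecting segments between strips contribute a term of order $r$.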



SNAPSHOT-TSP 

The second algorithm is designed for the heavy load case.
This algorithm requires that the subregion $S \subseteq Q$ has a size
and shape such that it can be contained in the sensor footprint
of the agent, i.e., there exists a position $p$ such that $\|p - q\| \le r$
for all $q \in S$. Let $p_{\mathrm{snap}}$ be one such position. The algorithm
is defined as follows.

• Move to location $p_{\mathrm{snap}}$ and take a snapshot, i.e., store
in memory the locations of all targets outstanding at the
current time, called $t_{\mathrm{snap}}$.

• Compute a minimum-length tour of all points in the
snapshot.

• Generate a uniformly distributed (in terms of path length)
random starting position on the tour (not necessarily a
target's location).

• Randomly select a direction (clockwise or
counterclockwise) with equal probability.

• Move to the starting position and execute the tour in
the chosen direction, ignoring all targets that appear after
$t_{\mathrm{snap}}$.
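
The steps above can be sketched in code as follows. This is an illustration under stated assumptions, not the algorithm as analyzed: a nearest-neighbor heuristic stands in for the minimum-length tour (it does not compute an optimal tour), and the function names are hypothetical. The second function implements the random starting position, uniform in path length along the closed tour, and the random direction.

```python
import math
import random


def nearest_neighbor_tour(points):
    """Heuristic closed tour; a stand-in for a minimum-length tour."""
    if not points:
        return []
    unvisited = list(points[1:])
    tour = [points[0]]
    while unvisited:
        nxt = min(unvisited, key=lambda p: math.dist(tour[-1], p))
        unvisited.remove(nxt)
        tour.append(nxt)
    return tour


def randomized_service_order(tour, rng):
    """Return (start position, visiting order) for a closed tour.

    The start is uniform in arc length along the tour; the direction
    (forward or backward along the tour) is chosen with equal probability.
    """
    n = len(tour)
    if n < 2:
        return (tour[0] if tour else None), list(tour)
    edges = [math.dist(tour[i], tour[(i + 1) % n]) for i in range(n)]
    s = rng.uniform(0.0, sum(edges))
    i = 0
    while i < n - 1 and s > edges[i]:  # locate the edge containing offset s
        s -= edges[i]
        i += 1
    a, b = tour[i], tour[(i + 1) % n]
    frac = s / edges[i] if edges[i] > 0 else 0.0
    start = (a[0] + frac * (b[0] - a[0]), a[1] + frac * (b[1] - a[1]))
    if rng.random() < 0.5:
        order = [tour[(i + 1 + k) % n] for k in range(n)]  # forward
    else:
        order = [tour[(i - k) % n] for k in range(n)]      # backward
    return start, order


rng = random.Random(1)
snapshot = [(rng.random(), rng.random()) for _ in range(20)]
tour = nearest_neighbor_tour(snapshot)
start, order = randomized_service_order(tour, rng)
```

Targets that appear after the snapshot are simply absent from `snapshot`, which mirrors the rule that they are ignored until the next execution.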

We now analyze the length of the tour computed by the
algorithm. Define the set of targets in the snapshot as $\mathcal{D}_{\mathrm{snap}}$
and denote the cardinality of this set as $n_{\mathrm{snap}}$.

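Proposition 4: Assuming that all targets generated before
some past time $t_{\mathrm{clear}}$ were cleared from $S$, and the set of
targets outstanding at the current time $t$ is the set generated
by the local Poisson process in the time interval $(t_{\mathrm{clear}}, t]$ of
length $\Delta t = t - t_{\mathrm{clear}}$, the length of the tour computed by
SNAPSHOT-TSP satisfies

$$\lim_{\lambda \to \infty} \frac{\mathrm{E}\left[\mathrm{ETSP}(\mathcal{D}_{\mathrm{snap}})\right]}{\sqrt{\lambda}} = \beta \sqrt{\Delta t} \int_S \sqrt{\varphi(q)}\,dq.$$

Proof: The points in the set $\mathcal{D}_{\mathrm{snap}}$ are i.i.d. with compact
support $S \subset \mathbb{R}^2$, and $n_{\mathrm{snap}} \to \infty$ almost surely as $\lambda \to \infty$.
By Theorem 1

$$\lim_{n_{\mathrm{snap}} \to \infty} \frac{\mathrm{ETSP}(\mathcal{D}_{\mathrm{snap}})}{\sqrt{n_{\mathrm{snap}}}} = \beta \int_S \sqrt{\varphi_S(q)}\,dq.$$

Thus as $n_{\mathrm{snap}} \to \infty$,

$$\mathrm{E}\left[\mathrm{ETSP}(\mathcal{D}_{\mathrm{snap}})\right] = \mathrm{E}\left[\beta \sqrt{n_{\mathrm{snap}}} \int_S \sqrt{\varphi_S(q)}\,dq\right] = \beta\, \mathrm{E}\left[\sqrt{n_{\mathrm{snap}}}\right] \int_S \sqrt{\varphi_S(q)}\,dq.$$

Define the length of the time interval over which $\mathcal{D}_{\mathrm{snap}}$ was
generated as $\Delta t = t - t_{\mathrm{clear}}$. The random variable $n_{\mathrm{snap}}$ is
Poisson with mean $\mathrm{E}[n_{\mathrm{snap}}] = \lambda_S \Delta t$. By Jensen's inequality,

$$\mathrm{E}\left[\sqrt{n_{\mathrm{snap}}}\right] \le \sqrt{\mathrm{E}[n_{\mathrm{snap}}]}.$$

However, as $\lambda_S \to \infty$, the above inequality approaches
equality, i.e.,

$$\lim_{\lambda_S \to \infty} \frac{\mathrm{E}\left[\sqrt{n_{\mathrm{snap}}}\right]}{\sqrt{\lambda_S}} = \sqrt{\Delta t},$$

and thus

$$\lim_{\lambda_S \to \infty} \frac{\mathrm{E}\left[\mathrm{ETSP}(\mathcal{D}_{\mathrm{snap}})\right]}{\sqrt{\lambda_S}} = \beta \sqrt{\Delta t} \int_S \sqrt{\varphi_S(q)}\,dq.$$

Substituting $\lambda_S = \lambda \varphi(S)$ and $\varphi_S(q) = \varphi(q)/\varphi(S)$, we arrive
at the claim. ■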
The reader might find the choice of taking a snapshot at a 
particular instant and ignoring all targets generated thereafter 
peculiar. One might suggest that the agent could easily service 
newly generated targets whose locations happen to coincide 
with the vicinity of targets already in the current snapshot. 
However, with the described method, the sets of points in the
snapshot are i.i.d. from the given Poisson process, and this
allows us to apply Theorem 1 to the tour computed for each
snapshot.
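
The step in the proof of Proposition 4 where Jensen's inequality approaches equality can be checked numerically. The sketch below computes $\mathrm{E}[\sqrt{N}]$ for a Poisson random variable $N$ by summing its probability mass function in log space and compares it with $\sqrt{\mathrm{E}[N]}$. The function name and the truncation point of the sum are assumptions made for this illustration.

```python
import math


def expected_sqrt_poisson(mean):
    """E[sqrt(N)] for N ~ Poisson(mean), by direct summation of the pmf.

    The sum is truncated well beyond the mean (about 12 standard
    deviations), where the remaining tail mass is negligible.
    """
    upper = int(mean + 12.0 * math.sqrt(mean) + 30)
    total = 0.0
    for k in range(1, upper + 1):
        # log of exp(-mean) * mean**k / k!, evaluated stably
        log_pmf = -mean + k * math.log(mean) - math.lgamma(k + 1)
        total += math.sqrt(k) * math.exp(log_pmf)
    return total


ratios = [expected_sqrt_poisson(mu) / math.sqrt(mu)
          for mu in (1.0, 100.0, 10000.0)]
print(ratios)
```

Each ratio is below one, as Jensen's inequality requires, and the ratios increase toward one as the mean grows, which is the behavior the proof relies on when $\lambda_S \to \infty$.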

We now present four policies, each of which is designed 
for one of the four cases: small sensing-range (spatially biased 
and unbiased) and heavy load (spatially biased and unbiased). 
Some of the policies we present require a partition of the
environment $Q$ into tiles $(S_1, S_2, \ldots, S_K)$, i.e., $\cup_{k=1}^K S_k = Q$
and $S_k \cap S_\ell = \emptyset$ if $k \ne \ell$. Each policy has different
required properties of the partition; however, let us define
some of the properties here. An equitable partition with
respect to a measure $\psi : Q \to \mathbb{R}_+$ is a partition such that
$\psi(S_k) = \psi(S_\ell)$ for all $k, \ell \in I_K$. Note that this condition
implies $\psi(S_k) = \psi(Q)/K$ for all $k \in I_K$. A convex partition
is one whose subsets are convex.

A. The Unbiased Region Sweep (URS) Policy 

The policy is defined in Algorithm 1. The index i is a label
for the current phase of the policy.

for i ← 1 to ∞ do
    Execute SWEEP-SERVICE on the environment Q
end

Algorithm 1: URS Policy



Theorem 5: Let $\overline{T}_{\mathrm{opt}}$ be the optimal system time for the
single-agent Limited Sensing DTRP over the class of spatially
unbiased policies. Then the system time of a single agent
operating on $Q$ under the URS policy satisfies

$$\lim_{r \to 0^+} \frac{\overline{T}_{\mathrm{URS}}}{\overline{T}_{\mathrm{opt}}} = 1. \qquad (8)$$



Proof: Define a phase of this policy as the time interval
over which the agent performs one execution of
SWEEP-SERVICE on the region $Q$. Denote the length of the $i$-th
phase by $T_i^{\mathrm{phase}}$ and the number of targets visited during
the $i$-th phase by $n_i$. Assuming that the policy is stabilizing,
$n_i$ is finite. The expected length of the $i$-th phase $T_i^{\mathrm{phase}}$,
conditioned on $n_i$, satisfies

$$\mathrm{E}\left[T_i^{\mathrm{phase}} \mid n_i\right] \le \frac{L_{\mathrm{swp}}(Q) + 2 r n_i + \mathrm{diam}(Q)}{v}.$$

Applying Proposition 3,

$$\lim_{r \to 0^+} \mathrm{E}\left[T_i^{\mathrm{phase}} \mid n_i\right] r \le \lim_{r \to 0^+} \frac{L_{\mathrm{swp}}(Q)\, r + 2 n_i r^2 + \mathrm{diam}(Q)\, r}{v} = \frac{A}{2v}.$$

In other words, in the limit as $r \to 0^+$, the length of the phase
does not depend on the number of targets serviced, so long as
it is finite. In expectation, a target waits one half of a phase
to be serviced, independent of its location. Therefore

$$\lim_{r \to 0^+} \overline{T}_{\mathrm{URS}}\, r \le \frac{A}{4v},$$

and moreover, the policy is spatially unbiased. Combining the
above result with the lower bound on the optimal system time
within the class of spatially unbiased policies in Theorem 2
and substituting $m = 1$, the claim is proved. ■
Theorem 5 shows that the optimal spatially unbiased policy
for small sensing range simply searches the entire environment
with equal frequency. In this case, the constraint on spatial bias
does not allow the agent to leverage non-uniformity in the
distribution of targets, $\varphi$, in order to lower the system time.
We performed numerical experiments of the URS policy and
results are shown in comparison with the lower bound (Eq. (5))
in Fig. 2 on a log-log plot. The URS policy provides
near-optimal performance for very small values of sensing radius.



B. The Biased Tile Sweep (BTS) Policy 

This policy requires that $\varphi$ be a piecewise uniform density,
i.e., that $Q_1, Q_2, \ldots, Q_J$ be a partition of $Q$ such that
$\varphi(q) = \mu_j$ for all $q \in Q_j$, $j = 1, 2, \ldots, J$. Let us assume
that each subset $Q_j$ is convex, as a non-convex subset can be further
partitioned into convex subsets. Let us denote $A_j = \operatorname{Area}(Q_j)$,
$j = 1, 2, \ldots, J$.

Fig. 2. Simulation performance of the URS policy for a single agent with
unit velocity in a unit square environment and uniform spatial distribution.
Results are compared with the theoretical lower bound (Eq. (5)) as radius
$r$ approaches zero on a log-log plot (horizontal axis: log sensing radius).
The URS policy provides near-optimal performance for very small values of $r$.


This policy requires a tiling of the environment $Q$ with
the following properties. For some positive integer $K \in \mathbb{N}$,
partition each subset $Q_j$ into $K_j = K/\sqrt{\mu_j}$ convex tiles, each
of area $A_j/K_j = A_j\sqrt{\mu_j}/K$. We assume $K$ is chosen large
enough that an integer $K_j$ can be found such that $K/K_j$ is
sufficiently close to $\sqrt{\mu_j}$. In other words, it requires a convex
and equitable partition of each region $Q_j$ with respect to a
constant measure $\psi$. This can be done with the following
simple method. Partition $Q_j$ into strip-like tiles of equal
measure with $K_j$ parallel lines. Let us give the tiles of $Q_j$
an ordered labeling $S_{j,1}, S_{j,2}, \ldots, S_{j,K_j}$. The BTS policy is
defined in Algorithm 2. The index $i$ is a label for the current
phase of the policy.



Initialize k_j ← 1 for j = 1, 2, ..., J
for i ← 1 to ∞ do
    for j ← 1 to J do
        Execute SWEEP-SERVICE on tile S_{j,k_j}
        if k_j < K_j then k_j ← k_j + 1
        else k_j ← 1
    end
end

Algorithm 2: BTS Policy

Example 6: In order to illustrate the application of the
BTS policy on a specific problem instance, we consider the
environment and piecewise-constant density function shown
in Fig. 3. This example is made up of four subregions, $Q_j$,
each of constant density, $\mu_j$, indicated in the drawing. We do
not specify the areas of the subregions as their magnitudes
are not relevant to the algorithm. For the example to be well
posed, we can assume that the areas are such that $\varphi(Q) = 1$.
In fact, the absolute values of the $\mu_j$'s are not relevant either.
In order to apply the BTS policy, all we need to know are
the relative magnitudes of the $\mu_j$'s. In other words, how much
more likely is it that a target appears in $Q_1$ rather than $Q_2$?
To make the relative magnitudes clear, this
example's lowest-density subregion has a density of 1. Any
given piecewise-constant density can be normalized to such a
form.

The first step of the algorithm is to choose our scaling
constant, a positive integer $K \in \mathbb{N}$. Since the highest density is
$\mu_1 = 36$ and $K_1 = K/\sqrt{\mu_1}$, in order to ensure that $K_1 \ge 1$, we
must set $K \ge 6$. The beneficial aspect of setting $K$ arbitrarily
large is that it allows us to make $K/K_j$ arbitrarily close to
$\sqrt{\mu_j}$ for each $j$. But in practice, there is a cost associated with
increasing $K$: it increases the frequency of transition moves
between sweeping tiles. In the limit as $r \to 0^+$ these transition
costs are negligible as the time required to sweep an individual
tile dwarfs them, but for small but fixed $r$, these transition costs
must be balanced against the benefit of achieving the ideal ratios
between the $K_j$'s. In this particular example, we do not have
to make a trade-off between the transition costs and the ideal
ratios of the $K_j$'s. We can set $K$ to its minimum feasible
value, $K = 6$, and then $K_j = 6/\sqrt{\mu_j}$ for each $j$, resulting in

$$K_1 = 1, \quad K_2 = 2, \quad K_3 = 3, \quad K_4 = 6.$$

Each subregion $Q_j$ is divided into $K_j$ tiles of equal measure,
$S_{j,1}, S_{j,2}, \ldots, S_{j,K_j}$, as shown in Fig. 3. Then the agent
performs SWEEP-SERVICE on the tiles in the following
repeating sequence. During each phase, one tile from each
subregion is swept:

phase 1: $\{S_{1,1}, S_{2,1}, S_{3,1}, S_{4,1}\}$,
phase 2: $\{S_{1,1}, S_{2,2}, S_{3,2}, S_{4,2}\}$,
phase 3: $\{S_{1,1}, S_{2,1}, S_{3,3}, S_{4,3}\}$,
phase 4: $\{S_{1,1}, S_{2,2}, S_{3,1}, S_{4,4}\}$,
phase 5: $\{S_{1,1}, S_{2,1}, S_{3,2}, S_{4,5}\}$,
phase 6: $\{S_{1,1}, S_{2,2}, S_{3,3}, S_{4,6}\}$.

Again, the relative magnitudes of the areas of the subregions
are irrelevant to the algorithm. The subregions in this example
are of equal area only for clarity and simplicity. Of course,
the absolute magnitudes of the areas relative to the agent's
sensing radius will directly influence the system time achieved
by the algorithm. Also, the subregions need not be convex
as convexity is not required to compute an efficient path
coverage. Furthermore, the subregions need not be connected.
We only require that the time required to travel between them
be negligible relative to the time required to perform path
coverage sweeps on them.
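The tile bookkeeping of Algorithm 2 can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' implementation: the function names are hypothetical, the densities 9 and 4 for the two middle subregions are inferred from $K_2 = 2$ and $K_3 = 3$ with $K = 6$ rather than read from the figure, and rounding $K/\sqrt{\mu_j}$ to the nearest integer is an assumption about how a non-integer ratio would be handled.

```python
import math
from itertools import islice


def tile_counts(densities, K):
    """Number of tiles K_j = K / sqrt(mu_j) per subregion, rounded to an integer."""
    return [max(1, round(K / math.sqrt(mu))) for mu in densities]


def bts_schedule(counts):
    """Yield, phase by phase, the 1-based tile index k_j swept in each subregion."""
    k = [1] * len(counts)               # k_j <- 1
    while True:
        yield tuple(k)                  # tiles S_{j,k_j} swept in this phase
        # Advance each index cyclically through 1..K_j.
        k = [kj + 1 if kj < Kj else 1 for kj, Kj in zip(k, counts)]


if __name__ == "__main__":
    counts = tile_counts([36, 9, 4, 1], K=6)
    for phase, tiles in enumerate(islice(bts_schedule(counts), 6), start=1):
        print("phase", phase, tiles)
```

With these inputs the generator cycles through the same six phases listed in the example and then repeats, so the densest subregion is swept every phase while the sparsest is fully swept once every six phases.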

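(a) Example environment and distribution.
(b) Tiling created by BTS policy.
Fig. 3. Example environment and resultant tiling created by BTS policy.

Theorem 7: If $\varphi$ is a piecewise uniform density and $T_{\mathrm{opt}}$ is
the optimal system time for the single-agent Limited Sensing
DTRP over the class of spatially biased policies, then the
system time of a single agent operating on $Q$ under the BTS
policy satisfies

$$\frac{T_{\mathrm{BTS}}}{T_{\mathrm{opt}}} \to 1 \quad \text{as } r \to 0^+. \tag{9}$$

Proof: The total distance traveled between tiles during a
phase is no more than $J \operatorname{diam}(Q)$. Assuming that the policy
is stabilizing, the number of targets serviced during the $i$-th
phase, $n_i$, is finite. The expected length of the $i$-th phase $T_i^{\mathrm{phase}}$,
conditioned on $n_i$, satisfies

$$E\left[T_i^{\mathrm{phase}} \mid n_i\right] \le \frac{\sum_{j=1}^{J} L^{\mathrm{swp}}(S_{j,k_j}) + 2 r n_i + J \operatorname{diam}(Q)}{v}.$$

Applying Proposition 3,

$$\lim_{r \to 0^+} E\left[T_i^{\mathrm{phase}} \mid n_i\right] r \le \lim_{r \to 0^+} \frac{\sum_{j=1}^{J} L^{\mathrm{swp}}(S_{j,k_j})\, r}{v} + \lim_{r \to 0^+} \frac{2 n_i r^2 + J \operatorname{diam}(Q)\, r}{v} = \frac{1}{2Kv} \sum_{j=1}^{J} A_j \sqrt{\mu_j}.$$

Conditioned upon its location $q \in Q_j$, a target waits one half
of $K_j$ phases to be serviced,

$$E\left[T_{\mathrm{BTS}} \mid q \in Q_j\right] = \frac{1}{2} K_j T^{\mathrm{phase}} = \frac{1}{2} \frac{K}{\sqrt{\mu_j}} T^{\mathrm{phase}}.$$

Noting that $\Pr[q \in Q_j] = \mu_j A_j$ and unconditioning on $q \in Q_j$
to find the system time,

$$T_{\mathrm{BTS}} = \sum_{j=1}^{J} \Pr[q \in Q_j] \cdot E\left[T_{\mathrm{BTS}} \mid q \in Q_j\right] = \frac{K T^{\mathrm{phase}}}{2} \sum_{j=1}^{J} A_j \sqrt{\mu_j}.$$

Thus,

$$\lim_{r \to 0^+} T_{\mathrm{BTS}}\, r = \frac{K}{2} \left( \sum_{j=1}^{J} A_j \sqrt{\mu_j} \right) \lim_{r \to 0^+} T^{\mathrm{phase}}\, r \le \frac{1}{4v} \left( \sum_{j=1}^{J} A_j \sqrt{\mu_j} \right)^2.$$

If $\varphi$ is a piecewise uniform density and $m = 1$, the lower
bound on the optimal system time within the class of spatially
biased policies in Theorem 2 takes on the form

$$\lim_{r \to 0^+} T_{\mathrm{opt}}\, r \ge \frac{1}{4v} \left( \sum_{j=1}^{J} A_j \sqrt{\mu_j} \right)^2.$$

Combining, the claim is proved. ■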
Theorem 7 shows that the optimal spatially biased policy
for small sensor range searches a point in the environment at
regular intervals at a relative frequency proportional to $\sqrt{\varphi(q)}$,
as was suggested by Eq. (7) in the proof of the corresponding
spatially biased lower bound in Theorem 2.

We performed numerical experiments of the BTS policy
for a single agent. As shown in Fig. 4(a), the spatial
distribution of the target-generation process was piecewise
uniform with a density of $\varphi_1 = 1 + 10\epsilon$ in the smaller region
of area 0.1 and $\varphi_2 = 1 - 10\epsilon/9$ in the larger region of area
0.9. We varied $\epsilon$ from 0 to 0.89. This tested the algorithm's
performance under a large range of spatial distributions: from
uniform ($\epsilon = 0$) to one in which targets appear in the
smaller region with a 99% probability ($\epsilon = 0.89$). Results
are shown in comparison with the lower bound (Eq. (6)) in
Fig. 4(b) on a semi-log plot. The BTS policy provides near-optimal
performance for a large range of spatial distributions,
i.e., the BTS policy adapts the distribution of the searching
agent's position in order to exploit the spatially biased
target-generation process and thereby reduce the expected system
time overall. In other words, the searching agent provides
higher quality of service to the targets in the higher density
regions. Specifically, under the BTS policy, a target's expected
waiting time scales (relative to targets in other
regions) with the inverse square root of its region's density.
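To make the dependence on $\epsilon$ concrete, the sketch below evaluates the limiting single-agent expression $\frac{1}{4vr}\left(\sum_j A_j\sqrt{\mu_j}\right)^2$ from the proof of Theorem 7 for the two-region density of the experiments. It only evaluates the closed-form expression and does not reproduce the simulation; the function names are hypothetical.

```python
import math


def small_radius_system_time(areas, densities, r, v=1.0):
    """Limiting value (1 / (4 v r)) * (sum_j A_j sqrt(mu_j))**2 for one agent."""
    s = sum(a * math.sqrt(mu) for a, mu in zip(areas, densities))
    return s * s / (4.0 * v * r)


def experiment_density(eps):
    """Areas and densities of the two-region experiment, parameterized by eps."""
    areas = [0.1, 0.9]
    densities = [1.0 + 10.0 * eps, 1.0 - 10.0 * eps / 9.0]
    return areas, densities


if __name__ == "__main__":
    r = 0.00625
    for eps in (0.0, 0.3, 0.6, 0.89):
        areas, mus = experiment_density(eps)
        mass_small = areas[0] * mus[0]      # probability of the smaller region
        print(eps, round(mass_small, 3), small_radius_system_time(areas, mus, r))
```

For the uniform case the expression reduces to $A/(4vr)$, and it decreases as more of the probability mass concentrates in the smaller region.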



(a) Drawing of the environment and spatial distribution
(parameterized using $\epsilon$) used for the numerical experiments
on the BTS policy, with $\varphi_1 = 1 + 10\epsilon$ and $\varphi_2 = 1 - 10\epsilon/9$.

(b) Simulation performance of the BTS policy for a single agent in a unit
square environment with unit velocity and sensing radius $r = 0.00625$.
Results are compared with the theoretical lower bound (Eq. (6)) for varying
spatial distribution on a semi-log plot. The parameter $\epsilon$ was varied from
0 to 0.89, i.e., the spatial distribution varied from uniform to one in which
99% of the incidents occur in the subregion with 10% of the area.

Fig. 4. Numerical experiments of the BTS policy.



C. The Unbiased Tile TSP (UTTSP) Policy 

This policy requires a tiling of the environment $Q$ with
the following properties. For some positive integer $K \in \mathbb{N}$,
partition $Q$ into tiles $S_1, S_2, \ldots, S_K$ such that

$$\int_{S_k} \sqrt{\varphi(q)}\, dq = \frac{1}{K} \int_{Q} \sqrt{\varphi(q)}\, dq, \qquad k = 1, 2, \ldots, K.$$

In other words, it requires a convex and equitable partition
of the environment $Q$ with respect to the measure $\psi$ where
$\psi(q) = \sqrt{\varphi(q)}$ for all $q \in Q$. Furthermore, the size and shape
of each tile must be such that it can be contained in the sensor
footprint of an agent, i.e., for each $S_k$ there exists a point $p_k$
such that $\|p_k - q\| \le r$ for all $q \in S_k$. For example, if each
tile can be bounded by a rectangle, neither of whose side
lengths exceeds $r/\sqrt{2}$, then this property is achieved. A tiling
with all these properties can be constructed with the following
simple method. First, partition $Q$ into strip-like tiles of equal
measure with $K_1$ parallel lines. If $K_1$ is sufficiently large, then
the width of every strip is less than or equal to $r/\sqrt{2}$.
Next, partition each of the $K_1$ strip-like tiles into $K_2$ tiles of
equal measure using lines perpendicular to the first set. If $K_2$
is sufficiently large, then the height of all tiles is less than or
equal to $r/\sqrt{2}$. The UTTSP policy is defined in Algorithm 3.
The index $i$ is a label for the current phase of the policy.



for i ← 1 to ∞ do
    for k ← 1 to K do
        Execute SNAPSHOT-TSP on tile S_k
    end
end

Algorithm 3: UTTSP Policy
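For intuition about the grid-like tiling described before Algorithm 3, the sketch below treats the simplest special case: a rectangular environment with uniform density, where strips of equal measure are simply strips of equal width. It counts strips rather than cutting lines, and both function names are hypothetical; non-uniform densities would need strips of unequal width and are not handled here.

```python
import math


def grid_tiling_counts(width, height, r):
    """Smallest numbers of strips so every cell has side lengths <= r / sqrt(2).

    Assumes a rectangular environment with uniform density, where strips of
    equal measure are strips of equal width.
    """
    side = r / math.sqrt(2.0)
    k1 = math.ceil(width / side)    # strips across the width
    k2 = math.ceil(height / side)   # cells per strip along the height
    return k1, k2


def covering_radius(width, height, k1, k2):
    """Distance from the center of a cell to its farthest corner."""
    return 0.5 * math.hypot(width / k1, height / k2)


if __name__ == "__main__":
    k1, k2 = grid_tiling_counts(1.0, 1.0, r=0.1)
    print(k1, k2, covering_radius(1.0, 1.0, k1, k2))
```

Placing $p_k$ at the cell center, the farthest point of the cell is half a diagonal away, which is at most $r/2$ when both sides are bounded by $r/\sqrt{2}$, so the footprint condition holds with room to spare.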



Theorem 8: Let $T_{\mathrm{opt}}$ be the optimal system time for the
single-agent Limited Sensing DTRP over the class of spatially
unbiased policies. Then the system time of a single agent
operating on $Q$ under the UTTSP policy satisfies

$$\frac{T_{\mathrm{UTTSP}}}{T_{\mathrm{opt}}} \to 1 \quad \text{as } \lambda \to \infty. \tag{10}$$



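Proof: Let us denote $T^{\mathrm{tsp}}_{k,i}$ as the time required to execute
the tour of the targets in tile $S_k$ computed by SNAPSHOT-TSP
in the $i$-th phase. Including the time traveling between
tiles, we note that

$$T_i^{\mathrm{phase}} \le \sum_{k=1}^{K} T^{\mathrm{tsp}}_{k,i} + \frac{K \operatorname{diam}(Q)}{v}.$$

Applying Proposition 4, the time required for the first tile in
the $(i+1)$-th phase, $T^{\mathrm{tsp}}_{1,i+1}$, conditioned on the length of the
$i$-th phase $T_i^{\mathrm{phase}}$, satisfies

$$\lim_{\lambda \to \infty} \frac{E\left[T^{\mathrm{tsp}}_{1,i+1} \mid T_i^{\mathrm{phase}}\right]}{\sqrt{\lambda}} = \frac{\beta \sqrt{T_i^{\mathrm{phase}}}}{v} \int_{S_1} \sqrt{\varphi(q)}\, dq = \frac{\beta \sqrt{T_i^{\mathrm{phase}}}}{K v} \int_{Q} \sqrt{\varphi(q)}\, dq.$$

Since the choice of the first tile, and the epoch of the phase,
is arbitrary, the expected time to visit all targets in a tile is
uniform over all tiles. Summing over all tiles, we get

$$\lim_{\lambda \to \infty} \frac{E\left[T_{i+1}^{\mathrm{phase}} \mid T_i^{\mathrm{phase}}\right]}{\sqrt{\lambda}} \le K \cdot \lim_{\lambda \to \infty} \frac{E\left[T^{\mathrm{tsp}}_{1,i+1} \mid T_i^{\mathrm{phase}}\right]}{\sqrt{\lambda}} + \lim_{\lambda \to \infty} \frac{K \operatorname{diam}(Q)}{v \sqrt{\lambda}} = \frac{\beta \sqrt{T_i^{\mathrm{phase}}}}{v} \int_{Q} \sqrt{\varphi(q)}\, dq.$$

From the equation above, it can be verified that

$$E\left[T_{i+1}^{\mathrm{phase}} \mid T_i^{\mathrm{phase}}\right] > T_i^{\mathrm{phase}} \quad \text{if } T_i^{\mathrm{phase}} < \frac{\beta^2 \lambda}{v^2} \left( \int_{Q} \sqrt{\varphi(q)}\, dq \right)^2,$$

and

$$E\left[T_{i+1}^{\mathrm{phase}} \mid T_i^{\mathrm{phase}}\right] < T_i^{\mathrm{phase}} \quad \text{if } T_i^{\mathrm{phase}} > \frac{\beta^2 \lambda}{v^2} \left( \int_{Q} \sqrt{\varphi(q)}\, dq \right)^2.$$

Thus, the sequence of phases exhibits a fixed-point, from
which

$$\lim_{i \to \infty} E\left[T_i^{\mathrm{phase}}\right] = \frac{\beta^2 \lambda}{v^2} \left( \int_{Q} \sqrt{\varphi(q)}\, dq \right)^2.$$

Substituting, as $i \to \infty$,

$$E\left[T^{\mathrm{tsp}}_{k,i}\right] = \frac{\beta}{K v} \sqrt{\lambda T^{\mathrm{phase}}} \int_{Q} \sqrt{\varphi(q)}\, dq = \frac{\beta^2 \lambda}{K v^2} \left( \int_{Q} \sqrt{\varphi(q)}\, dq \right)^2 = \frac{T^{\mathrm{phase}}}{K}.$$

Note that the above quantity is independent of the specific
tile. In expectation, a target waits one half of a phase to enter
a snapshot. Because of the randomization in starting point
and direction performed in SNAPSHOT-TSP, in expectation,
a target waits one half of the time required to visit all targets
in its snapshot. Thus,

$$E\left[T_{\mathrm{UTTSP}}\right] = \frac{1}{2} T^{\mathrm{phase}} \left( 1 + \frac{1}{K} \right) = \frac{\beta^2 \lambda}{2 v^2} \left( 1 + \frac{1}{K} \right) \left( \int_{Q} \sqrt{\varphi(q)}\, dq \right)^2.$$

Since the system time is independent of the location of the
target, the policy is spatially unbiased. Using the optimal
system time for the full-information DTRP in heavy load
within the class of spatially unbiased policies in Eq. (3) with
$m = 1$ as a lower bound, we see that for large $K$, the claim
is proved. ■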
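The fixed-point argument in the proof can be illustrated numerically. The sketch below iterates the heavy-load recursion $T_{i+1} = (\beta/v)\sqrt{\lambda T_i}\int_Q\sqrt{\varphi}$ on expected phase lengths, ignoring the inter-tile travel term that vanishes relative to $\sqrt{\lambda}$. The function names are hypothetical, and the default `beta` is only a placeholder for the tour-length constant $\beta$ used in the document, with an approximate value assumed here.

```python
import math


def next_phase_length(t_phase, lam, sqrt_phi_integral, v=1.0, beta=0.712):
    """Heavy-load recursion T_{i+1} = (beta / v) * sqrt(lam * T_i) * I."""
    return beta * math.sqrt(lam * t_phase) * sqrt_phi_integral / v


def fixed_point(lam, sqrt_phi_integral, v=1.0, beta=0.712):
    """Closed-form fixed point beta^2 * lam * I^2 / v^2 of the recursion."""
    return (beta ** 2) * lam * sqrt_phi_integral ** 2 / v ** 2


def iterate(t0, lam, sqrt_phi_integral, steps=60, v=1.0, beta=0.712):
    """Apply the recursion `steps` times starting from phase length t0."""
    t = t0
    for _ in range(steps):
        t = next_phase_length(t, lam, sqrt_phi_integral, v, beta)
    return t


if __name__ == "__main__":
    lam, integral = 500.0, 1.0      # hypothetical arrival rate, uniform unit square
    print(fixed_point(lam, integral))
    print(iterate(1e-3, lam, integral), iterate(1e6, lam, integral))
```

Starting below the fixed point the phase length grows, starting above it shrinks, and both runs settle on the same value, mirroring the two inequalities in the proof.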
Although the constraint on spatial bias does not allow the
policy to service denser regions with higher frequency, Theorem 8
shows that non-uniformity in the spatial distribution of
targets, $\varphi$, still leads to a lowering of the optimal system time.
This is due to the efficiency gained by the ETSP tours due to
non-uniformity, evident in Theorem 1.

D. The Biased Tile TSP (BTTSP) Policy 

This policy requires that $\varphi$ be a piecewise uniform density,
i.e., that $Q_1, Q_2, \ldots, Q_J$ be a partition of $Q$ such that
$\varphi(q) = \mu_j$ for all $q \in Q_j$, $j = 1, 2, \ldots, J$. Let us assume
that each subset $Q_j$ is convex, as a non-convex subset can be further
partitioned into convex subsets. Let us denote $A_j = \operatorname{Area}(Q_j)$,
$j = 1, 2, \ldots, J$.

This policy requires a tiling of the environment $Q$ with
the following properties. For some positive integer $K \in \mathbb{N}$,
partition each subset $Q_j$ into $K_j = K/\mu_j^{1/3}$ convex tiles, each
of area $A_j/K_j = A_j \mu_j^{1/3}/K$. We assume $K$ is chosen large
enough that an integer $K_j$ can be found such that $K/K_j$ is
sufficiently close to $\mu_j^{1/3}$. Furthermore, the size and shape of
each tile must be such that it can be contained in the sensor
footprint of an agent, i.e., for each $S_k$ there exists a point
$p_k$ such that $\|p_k - q\| \le r$ for all $q \in S_k$. The grid-like
equitable tiling described for the UTTSP policy, applied here,
would require that each $K_j$ be factorable into possibly large
numbers $K_1$ and $K_2$. This is undesirable because the numbers
$K_j$ must maintain specific ratios related to the density $\mu_j$ in
their domain $Q_j$ as described above. However, this scenario
does have one simpler facet: the density functions within
each $Q_j$ are constant. One example of a method for reaching
convex and equitable partitions is given in [44]. Heuristically
speaking, the proposed algorithms converge to configurations
in which all cells are approximately hexagonal for constant
measure $\psi$ and large $K_j$. Hence, for sufficiently large $K$,
these hexagonal tiles fit within a circle of radius $r$. Let us
give the tiles of $Q_j$ an ordered labeling $S_{j,1}, S_{j,2}, \ldots, S_{j,K_j}$.
The BTTSP policy is defined in Algorithm 4. The index $i$ is
a label for the current phase of the policy.



Initialize k_j ← 1 for j = 1, 2, ..., J
for i ← 1 to ∞ do
    for j ← 1 to J do
        Execute SNAPSHOT-TSP on tile S_{j,k_j}
        if k_j < K_j then k_j ← k_j + 1
        else k_j ← 1
    end
end

Algorithm 4: BTTSP Policy

Theorem 9: If $\varphi$ is a piecewise uniform density and $T_{\mathrm{opt}}$ is
the optimal system time for the single-agent Limited Sensing
DTRP over the class of spatially biased policies, then the
system time of a single agent operating on $Q$ under the BTTSP
policy satisfies

$$\frac{T_{\mathrm{BTTSP}}}{T_{\mathrm{opt}}} \to 1 \quad \text{as } \lambda \to \infty. \tag{11}$$



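Proof: Assume, for now, that the sequence of phases
exhibits a fixed-point, from which

$$\lim_{i \to \infty} E\left[T_i^{\mathrm{phase}}\right] = T_{ss}^{\mathrm{phase}},$$

where $T_{ss}^{\mathrm{phase}}$ denotes the steady-state phase length. Denote
$T^{\mathrm{tsp}}_{j,k_j,i}$ as the time required to execute the tour of the targets
in tile $S_{j,k_j}$ computed by SNAPSHOT-TSP in the $i$-th phase.
Any tile in $Q_j$ waits $K_j$ phases between snapshots. Applying
Proposition 4 with the fixed-point assumption, as $i \to \infty$,

$$\lim_{\lambda \to \infty} \frac{E\left[T^{\mathrm{tsp}}_{j,k_j,i}\right]}{\sqrt{\lambda}} = \frac{\beta}{v} \sqrt{K_j T_{ss}^{\mathrm{phase}} \mu_j}\, \frac{A_j}{K_j} = \frac{\beta}{v} \sqrt{\frac{T_{ss}^{\mathrm{phase}}}{K}}\, A_j \mu_j^{2/3}.$$

In other words, the fixed-point assumption implies that

$$\lim_{i \to \infty} E\left[T^{\mathrm{tsp}}_{j,k_j,i}\right] = T^{\mathrm{tsp}}_{j,k_j,ss} = \frac{\beta}{v} \sqrt{\frac{\lambda T_{ss}^{\mathrm{phase}}}{K}}\, A_j \mu_j^{2/3}$$

for $k_j = 1, 2, \ldots, K_j$. Summing over all tiles in a phase, and
including distance traveled between tiles,

$$\lim_{\lambda \to \infty} \frac{T_{ss}^{\mathrm{phase}}}{\sqrt{\lambda}} \le \lim_{\lambda \to \infty} \sum_{j=1}^{J} \frac{T^{\mathrm{tsp}}_{j,k_j,ss}}{\sqrt{\lambda}} + \lim_{\lambda \to \infty} \frac{J \cdot \operatorname{diam}(Q)}{v \sqrt{\lambda}} = \frac{\beta}{v} \sqrt{\frac{T_{ss}^{\mathrm{phase}}}{K}} \sum_{j=1}^{J} A_j \mu_j^{2/3}.$$

The above implies that

$$T_{ss}^{\mathrm{phase}} = \frac{\beta^2 \lambda}{K v^2} \left( \sum_{j=1}^{J} A_j \mu_j^{2/3} \right)^2.$$

We now investigate the expected system time of a target,
conditioned upon its location $q \in Q_j$. The time a target waits
before entering a snapshot, $T^{-}_{\mathrm{snap}}$, is one half of $K_j$ phases,
i.e.,

$$E\left[T^{-}_{\mathrm{snap}} \mid q \in Q_j\right] = \frac{1}{2} K_j T_{ss}^{\mathrm{phase}} = \frac{1}{2} K T_{ss}^{\mathrm{phase}} \mu_j^{-1/3}.$$

Unconditioning on the location of the target,

$$E\left[T^{-}_{\mathrm{snap}}\right] = \sum_{j=1}^{J} \Pr[q \in Q_j] \cdot E\left[T^{-}_{\mathrm{snap}} \mid q \in Q_j\right] = \sum_{j=1}^{J} (A_j \mu_j) \cdot \left( \frac{1}{2} K T_{ss}^{\mathrm{phase}} \mu_j^{-1/3} \right) = \frac{1}{2} K T_{ss}^{\mathrm{phase}} \sum_{j=1}^{J} A_j \mu_j^{2/3}.$$

Because of the randomization in starting point and direction
performed in SNAPSHOT-TSP, in expectation, the time a
target waits after entering a snapshot, $T^{+}_{\mathrm{snap}}$, is one half of
the time required to visit all targets in its snapshot,

$$E\left[T^{+}_{\mathrm{snap}} \mid q \in Q_j\right] = \frac{1}{2} T^{\mathrm{tsp}}_{j,k_j,ss} = \frac{\beta}{2v} \sqrt{\frac{\lambda T_{ss}^{\mathrm{phase}}}{K}}\, A_j \mu_j^{2/3}.$$

Unconditioning on the location of the target,

$$E\left[T^{+}_{\mathrm{snap}}\right] = \sum_{j=1}^{J} \Pr[q \in Q_j] \cdot E\left[T^{+}_{\mathrm{snap}} \mid q \in Q_j\right] = \frac{\beta}{2v} \sqrt{\frac{\lambda T_{ss}^{\mathrm{phase}}}{K}} \sum_{j=1}^{J} A_j^2 \mu_j^{5/3}.$$

The system time is the sum of these two waiting times. Substituting
the steady-state phase length, the first term equals
$\frac{\beta^2 \lambda}{2 v^2} \left( \sum_{j} A_j \mu_j^{2/3} \right)^3$, while the second is smaller
by a factor of order $1/K$. Namely, for large $K$, we have

$$\lim_{\lambda \to \infty} \frac{T_{\mathrm{BTTSP}}}{\lambda} = \frac{\beta^2}{2 v^2} \left( \sum_{j=1}^{J} A_j \mu_j^{2/3} \right)^3.$$

If $\varphi$ is a piecewise uniform density and $m = 1$, then the
optimal system time for the full-information DTRP in heavy
load within the class of spatially biased policies
takes on the form

$$\lim_{\lambda \to \infty} \frac{T_{\mathrm{opt}}}{\lambda} = \frac{\beta^2}{2 v^2} \left( \sum_{j=1}^{J} A_j \mu_j^{2/3} \right)^3. \tag{12}$$

Using the above as a lower bound for the Limited Sensing
DTRP in heavy load, we see that for large $K$, the claim is
proved. ■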
Theorem 9 shows that the optimal spatially biased policy for
heavy load services targets in the vicinity of a point $q$ at regular
intervals at a relative frequency proportional to $\varphi(q)^{1/3}$.
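The role of $K$ in the heavy-load biased case can be sketched numerically. The snippet below is a hedged illustration under the fixed-point assumption of the proof of Theorem 9: it treats $K_j = K/\mu_j^{1/3}$ as exact when computing the system time, adds the wait before a snapshot and the wait within it, and shows the second contribution fading as $K$ grows. The function names, the rounding rule for tile counts, and the default `beta` are assumptions made for illustration.

```python
def bttsp_tile_counts(densities, K):
    """Tile counts K_j = K / mu_j**(1/3), rounded to the nearest positive integer."""
    return [max(1, round(K / mu ** (1.0 / 3.0))) for mu in densities]


def bttsp_system_time(areas, densities, lam, K, v=1.0, beta=0.712):
    """Heavy-load system time built from the steady-state phase length,
    treating K_j = K / mu_j**(1/3) as exact (no integer rounding)."""
    s = sum(a * mu ** (2.0 / 3.0) for a, mu in zip(areas, densities))
    t_phase = beta ** 2 * lam * s ** 2 / (K * v ** 2)       # steady-state phase
    wait_for_snapshot = 0.5 * K * t_phase * s               # half of K_j phases
    wait_in_snapshot = 0.5 * t_phase * sum(
        a * a * mu ** (5.0 / 3.0) for a, mu in zip(areas, densities)) / s
    return wait_for_snapshot + wait_in_snapshot


def bttsp_limit(areas, densities, lam, v=1.0, beta=0.712):
    """Large-K limit beta^2 lam / (2 v^2) * (sum_j A_j mu_j**(2/3))**3."""
    s = sum(a * mu ** (2.0 / 3.0) for a, mu in zip(areas, densities))
    return beta ** 2 * lam * s ** 3 / (2.0 * v ** 2)


if __name__ == "__main__":
    areas, mus = [0.1, 0.9], [9.9, 1.0 - 8.9 / 9.0]   # the eps = 0.89 density
    for K in (10, 100, 1000):
        print(K, bttsp_system_time(areas, mus, lam=1000.0, K=K))
    print("limit", bttsp_limit(areas, mus, lam=1000.0))
```

The gap to the limit shrinks like $1/K$, which is the sense in which the claim holds "for large $K$".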

V. Multiple Agents with Equitable Regions of 
Dominance 

We have presented four algorithms and proven them optimal 
in four respective cases of the single-vehicle Limited Sensing 
DTRP. We wish to adapt these policies to the multiple-vehicle 
scenario, retaining optimality, with minimal communication 
and collaboration among the agents. Towards this end, consider
the following strategy. Given a single-vehicle policy
$\pi$, partition the environment $Q$ into regions of dominance
$V(Q) = (V_1, V_2, \ldots, V_m)$ where $\cup_{\ell=1}^{m} V_\ell = Q$ and
$V_\ell \cap V_h = \emptyset$ if $\ell \ne h$. Each vehicle executes policy $\pi$ on its
own region of dominance. In other words, agent $\ell$ is responsible for all
targets appearing in $V_\ell$ and ignores all others.

The control policies to reach convex and equitable partitions
proposed in [44] are distributed and can be performed by
$m$ mobile agents, where each agent need only communicate
with agents in neighboring cells of the partition. We do not
pursue this facet of the multi-agent system further, as these
methods are available and well studied. In the following
theorems, we show that if the regions of dominance are a
convex and equitable partition of $Q$ with respect to a designed
measure $\psi$, then the decentralized strategy achieves an optimal
system time. However, the appropriate measure $\psi$ depends on
the problem parameters (small sensing radius or heavy load)
and spatial constraints (biased or unbiased). Moreover, it is a
function of the spatial distribution of targets $\varphi$ in three of the
four cases.
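To make "equitable with respect to a measure $\psi$" concrete, the sketch below handles a deliberately simple special case: $\psi$ depends on one coordinate only and is piecewise constant, so the regions of dominance are strips separated by cuts along that coordinate. This is a centralized toy computation and not the distributed partitioning policy of the cited work; the function name and inputs are hypothetical.

```python
def equitable_cuts(breakpoints, psi_values, m):
    """Cut positions splitting [breakpoints[0], breakpoints[-1]] into m strips
    of equal psi-measure, for psi piecewise constant (and positive) on the
    intervals between consecutive breakpoints."""
    widths = [b - a for a, b in zip(breakpoints, breakpoints[1:])]
    total = sum(w * p for w, p in zip(widths, psi_values))
    target = total / m          # measure each strip should carry
    cuts = []
    acc = 0.0                   # measure accumulated to the left of this interval
    next_goal = target          # cumulative measure at which the next cut falls
    for left, w, p in zip(breakpoints, widths, psi_values):
        piece = w * p
        # Place every cut whose cumulative goal lies inside this interval.
        while len(cuts) < m - 1 and acc + piece >= next_goal - 1e-12:
            cuts.append(left + (next_goal - acc) / p)
            next_goal += target
        acc += piece
    return cuts


if __name__ == "__main__":
    # psi = 3 on [0, 0.5] and psi = 1 on [0.5, 1], shared by two agents.
    print(equitable_cuts([0.0, 0.5, 1.0], [3.0, 1.0], m=2))
```

Choosing `psi_values` as 1, $\sqrt{\mu_j}$, or $\mu_j^{2/3}$ gives the strip boundaries for the different measures discussed in this section.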

Some of the single-vehicle policies in the previous section
require the knowledge of certain properties of the environment,
such as its target generation rate and its spatial target
distribution. A single-vehicle policy operating on a subregion
$V_\ell \subset Q$ takes as input the local target-generation process of
$V_\ell$ with time intensity $\lambda_\ell = \lambda \varphi(V_\ell)$ and spatial distribution
$\varphi_\ell : V_\ell \to \mathbb{R}_+$ where $\varphi_\ell(q) = \varphi(q)/\varphi(V_\ell)$ if $q \in V_\ell$ and
$\varphi_\ell(q) = 0$ otherwise. Note that $\varphi_\ell$ is normalized such that
$\varphi_\ell(V_\ell) = 1$.
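The local inputs handed to each single-vehicle policy follow directly from these definitions. A minimal sketch for a piecewise uniform density is given below; the function name and the way a region of dominance is described (a list of piece areas and densities lying inside $V_\ell$) are hypothetical.

```python
def local_process(lam, region_areas, region_densities):
    """Local arrival rate and normalized densities for one region of dominance.

    region_areas[j] and region_densities[j] describe the pieces of a piecewise
    uniform density phi that lie inside V_l (hypothetical inputs).
    """
    mass = sum(a * mu for a, mu in zip(region_areas, region_densities))  # phi(V_l)
    lam_local = lam * mass                                               # lambda_l
    local_densities = [mu / mass for mu in region_densities]             # phi_l
    return lam_local, local_densities


if __name__ == "__main__":
    lam_l, phi_l = local_process(10.0, [0.1, 0.2], [2.0, 1.0])
    print(lam_l, phi_l)
```

The returned densities integrate to one over the region, matching the normalization $\varphi_\ell(V_\ell) = 1$.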

Theorem 10: Let $T_{\mathrm{opt}}$ be the optimal system time for the
$m$-agent Limited Sensing DTRP over the class of spatially
unbiased policies, and let $V(Q)$ be a convex and equitable
partition of $Q$ with respect to measure $\psi$ where $\psi(q) = 1$ for
all $q \in Q$. If each agent $\ell \in I_m$ operates on its own region
of dominance $V_\ell$ under the URS policy, then the system time
satisfies

$$\frac{T_{\mathrm{URS/ERD}}}{T_{\mathrm{opt}}} \to 1 \quad \text{as } r \to 0^+. \tag{13}$$

Proof: Denoting $A_\ell = \operatorname{Area}(V_\ell)$, we apply Theorem 5 to
find the system time of a target, conditioned upon its location,

$$E\left[T_{\mathrm{URS/ERD}} \mid q \in V_\ell\right] = \frac{A_\ell}{4vr}.$$

Since $\psi(V_\ell) = A_\ell$, the equitable partition implies that $A_\ell = A/m$
for all $\ell \in I_m$. Hence

$$E\left[T_{\mathrm{URS/ERD}} \mid q \in V_\ell\right] = \frac{A}{4mvr},$$

and since the above is independent of the target's location,
it is in fact the unconditioned system time. Moreover, this
implies that the strategy is spatially unbiased. Combining with
Theorem 2, the claim is proved. ■

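Theorem 11: Let $\varphi$ be a piecewise uniform density, let $T_{\mathrm{opt}}$
be the optimal system time for the $m$-agent Limited Sensing
DTRP over the class of spatially biased policies, and let $V(Q)$
be a convex and equitable partition of $Q$ with respect to
measure $\psi$ where $\psi(q) = \sqrt{\varphi(q)}$ for all $q \in Q$. If each agent
$\ell \in I_m$ operates on its own region of dominance $V_\ell$ under the
BTS policy, then the system time satisfies

$$\frac{T_{\mathrm{BTS/ERD}}}{T_{\mathrm{opt}}} \to 1 \quad \text{as } r \to 0^+. \tag{14}$$

Proof: We apply Theorem 7 to find the system time of a
target, conditioned upon its location,

$$E\left[T_{\mathrm{BTS/ERD}} \mid q \in V_\ell\right] = \frac{1}{4vr} \left( \int_{V_\ell} \sqrt{\varphi_\ell(q)}\, dq \right)^2.$$

Substituting $\varphi_\ell(q) = \varphi(q)/\varphi(V_\ell)$ we get

$$E\left[T_{\mathrm{BTS/ERD}} \mid q \in V_\ell\right] = \frac{1}{4vr\, \varphi(V_\ell)} \left( \int_{V_\ell} \sqrt{\varphi(q)}\, dq \right)^2.$$

Unconditioning on the location of the target,

$$T_{\mathrm{BTS/ERD}} = \sum_{\ell=1}^{m} \Pr[q \in V_\ell] \cdot E\left[T_{\mathrm{BTS/ERD}} \mid q \in V_\ell\right] = \sum_{\ell=1}^{m} \varphi(V_\ell)\, \frac{1}{4vr\, \varphi(V_\ell)} \left( \int_{V_\ell} \sqrt{\varphi(q)}\, dq \right)^2.$$

But the equitable partition with respect to $\psi(q) = \sqrt{\varphi(q)}$
implies that

$$\int_{V_\ell} \sqrt{\varphi(q)}\, dq = \frac{1}{m} \int_{Q} \sqrt{\varphi(q)}\, dq, \qquad \forall \ell \in I_m,$$

and so

$$T_{\mathrm{BTS/ERD}} = \frac{1}{4mvr} \left( \int_{Q} \sqrt{\varphi(q)}\, dq \right)^2.$$

Combining with Theorem 2, the claim is proved. ■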

Theorem 12: Let $T_{\mathrm{opt}}$ be the optimal system time for the
$m$-agent Limited Sensing DTRP over the class of spatially
unbiased policies, and let $V(Q)$ be a convex and equitable
partition of $Q$ with respect to measure $\psi$ where $\psi(q) = \sqrt{\varphi(q)}$
for all $q \in Q$. If each agent $\ell \in I_m$ operates on its own region
of dominance $V_\ell$ under the UTTSP policy, then the system
time satisfies

$$\frac{T_{\mathrm{UTTSP/ERD}}}{T_{\mathrm{opt}}} \to 1 \quad \text{as } \lambda \to \infty. \tag{15}$$

Proof: We apply Theorem 8 to find the system time of a
target, conditioned upon its location,

$$E\left[T_{\mathrm{UTTSP/ERD}} \mid q \in V_\ell\right] = \frac{\beta^2 \lambda_\ell}{2 v^2} \left( \int_{V_\ell} \sqrt{\varphi_\ell(q)}\, dq \right)^2.$$

Substituting $\varphi_\ell(q) = \varphi(q)/\varphi(V_\ell)$ and $\lambda_\ell = \lambda \varphi(V_\ell)$ we get

$$E\left[T_{\mathrm{UTTSP/ERD}} \mid q \in V_\ell\right] = \frac{\beta^2 \lambda}{2 v^2} \left( \int_{V_\ell} \sqrt{\varphi(q)}\, dq \right)^2.$$

But the equitable partition with respect to $\psi(q) = \sqrt{\varphi(q)}$
implies that

$$\int_{V_\ell} \sqrt{\varphi(q)}\, dq = \frac{1}{m} \int_{Q} \sqrt{\varphi(q)}\, dq, \qquad \forall \ell \in I_m,$$

and so

$$E\left[T_{\mathrm{UTTSP/ERD}} \mid q \in V_\ell\right] = \frac{\beta^2 \lambda}{2 m^2 v^2} \left( \int_{Q} \sqrt{\varphi(q)}\, dq \right)^2.$$

Since the above is independent of the target's location, it is in
fact the unconditioned system time. Moreover, this implies that
the strategy is spatially unbiased. Combining with Eq. (3), the
claim is proved. ■
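Theorem 13: Let $\varphi$ be a piecewise uniform density, let $T_{\mathrm{opt}}$
be the optimal system time for the $m$-agent Limited Sensing
DTRP over the class of spatially biased policies, and let $V(Q)$
be a convex and equitable partition of $Q$ with respect to
measure $\psi$ where $\psi(q) = \varphi(q)^{2/3}$ for all $q \in Q$. If each
agent $\ell \in I_m$ operates on its own region of dominance $V_\ell$
under the BTTSP policy, then the system time satisfies

$$\frac{T_{\mathrm{BTTSP/ERD}}}{T_{\mathrm{opt}}} \to 1 \quad \text{as } \lambda \to \infty. \tag{16}$$

Proof: We apply Theorem 9 to find the system time of a
target, conditioned upon its location,

$$E\left[T_{\mathrm{BTTSP/ERD}} \mid q \in V_\ell\right] = \frac{\beta^2 \lambda_\ell}{2 v^2} \left( \int_{V_\ell} \varphi_\ell(q)^{2/3}\, dq \right)^3.$$

Substituting $\varphi_\ell(q) = \varphi(q)/\varphi(V_\ell)$ and $\lambda_\ell = \lambda \varphi(V_\ell)$ we get

$$E\left[T_{\mathrm{BTTSP/ERD}} \mid q \in V_\ell\right] = \frac{\beta^2 \lambda}{2 v^2} \frac{1}{\varphi(V_\ell)} \left( \int_{V_\ell} \varphi(q)^{2/3}\, dq \right)^3.$$

Unconditioning on the location of the target,

$$T_{\mathrm{BTTSP/ERD}} = \sum_{\ell=1}^{m} \Pr[q \in V_\ell] \cdot E\left[T_{\mathrm{BTTSP/ERD}} \mid q \in V_\ell\right] = \frac{\beta^2 \lambda}{2 v^2} \sum_{\ell=1}^{m} \left( \int_{V_\ell} \varphi(q)^{2/3}\, dq \right)^3.$$

But the equitable partition with respect to $\psi(q) = \varphi(q)^{2/3}$
implies that

$$\int_{V_\ell} \varphi(q)^{2/3}\, dq = \frac{1}{m} \int_{Q} \varphi(q)^{2/3}\, dq, \qquad \forall \ell \in I_m,$$

and so

$$T_{\mathrm{BTTSP/ERD}} = \frac{\beta^2 \lambda}{2 m^2 v^2} \left( \int_{Q} \varphi(q)^{2/3}\, dq \right)^3.$$

Combining with the optimal system time for the full-information DTRP
in heavy load within the class of spatially biased policies, the claim
is proved. ■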
In summary, we have offered four policies, each of which 
performs optimally for the four cases studied: small sensing- 
range (spatially biased and unbiased) and heavy load (spatially 
biased and unbiased). In addition, we have offered a method 
by which to adapt the four policies to a multi-vehicle setting, 
retaining optimality, with minimal communication or collab- 
oration. In particular, the agents partition the environment 
into regions of dominance, and each vehicle executes the 
single-vehicle policy on its own region, ignoring all others. 
However, the nature of the partition varies in the different 
cases addressed. Each scenario requires regions of dominance 



TABLE I 

Optimal system time for the DTRP with limited sensing. 





spatially unbiased 


spatially biased 


r ->• 0+ 


A 
Amvr 


irLM^dq) 2 


A — > 00 


L?,* (I v<p d< i) 2 





equitable with respect to a measure appearing in the optimal 
system time of the corresponding single-vehicle case. In the 
spatially unbiased small sensing-range case, the regions of 
dominance are equitable with respect to area. Interestingly, 
the spatially biased small sensing-range case, and the spatially 
unbiased heavy load case both require regions of dominance 
equitable with respect to measure tp where tp(q) = \J <p(q) for 
all q € Q. Finally, the spatially biased heavy load case requires 
regions of dominance equitable with respect to measure ip 
where ip(q) — (p(q) 2 ^ 3 for all q e Q. 
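The four limiting expressions summarized in Table I can be evaluated side by side for a piecewise uniform density. The sketch below is only a calculator for those expressions; the function name, the dictionary keys, and the default `beta` are hypothetical choices made for illustration.

```python
import math


def table_entries(areas, densities, m, r, lam, v=1.0, beta=0.712):
    """Evaluate the four limiting system-time expressions for a piecewise
    uniform density given as (area, density) pieces."""
    total_area = sum(areas)
    sqrt_int = sum(a * math.sqrt(mu) for a, mu in zip(areas, densities))
    two_thirds_int = sum(a * mu ** (2.0 / 3.0) for a, mu in zip(areas, densities))
    heavy = beta ** 2 * lam / (2.0 * m ** 2 * v ** 2)   # common heavy-load factor
    return {
        ("small r", "unbiased"): total_area / (4.0 * m * v * r),
        ("small r", "biased"): sqrt_int ** 2 / (4.0 * m * v * r),
        ("heavy load", "unbiased"): heavy * sqrt_int ** 2,
        ("heavy load", "biased"): heavy * two_thirds_int ** 3,
    }


if __name__ == "__main__":
    areas, mus = [0.1, 0.9], [5.5, 0.5]     # hypothetical density with unit mass
    for m in (1, 2, 4):
        print(m, table_entries(areas, mus, m, r=0.01, lam=1000.0))
```

Doubling the number of agents halves the small-radius entries and quarters the heavy-load entries, and in each row the biased entry is no larger than the unbiased one.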

We have shown that in heavy load, the limited information 
gathering capabilities of the agents have no effect on their 
achievable performance. We suggest the intuitive explanation 
that in heavy load, the environment is dense with targets, 
and so the searching component added to the DTRP is non- 
existent. On the other hand, for small sensing range, the lack of 
information gathering capability detracts from the achievable 
performance of the system. However, the presented policies 
for these cases were still proved optimal through new lower 
bounds related to the searching capability of the agents and 
the necessary structure of any stabilizing policy. 

VI. Conclusions 

We have addressed a multi-agent problem with information 
constraints we call the DTRP with limited sensing. Our 
analysis yields precise characterizations of the system time, 
and the parameters describing the capabilities and limitations 
of the agents and the nature of the environment appear (or 
don't appear) in these expressions, giving insight into how 
the parameters affect (or don't affect) the efficiency of the 
system. We summarize the characterizations of the optimal
system times for the four cases studied in Table I.

In terms of methods and approaches, we note the path taken 
in the small-sensor case. In order to place lower bounds on 
the achievable performance of any algorithm, we carefully 
relax constraints to arrive at a convex optimization problem 
whose solution offers insight into the structure of the optimal 
algorithm. Using this structure as guidance, we design a 
provably optimal algorithm. Moreover, we have made use of 
results from the mature fields of combinatorial optimization 
and probability theory. 

Another interesting result is that the limited sensing capa- 
bility has no impact on performance when the target arrival- 
rate is high. In other words, this lack of global information 
does not hinder an agent from efficient routing choices. This 
result assumes knowledge of prior statistics on the global 
environment. Perhaps through proper mechanism design, a
game-theoretic approach [18], [21] might integrate the learning
of the global structure of the environment and the adaptation
of policy choices. Consider a team of agents with limited
sensing and no communication [45] operating in a common
environment. From the perspective of an individual agent,
there might not be any observed difference between i) a region 
with low target-arrival rate, and ii) a region with high target- 
arrival rate that is frequently serviced by other agents. Does 
this difference matter? In either case, the agent should search 
for target-rich regions in order to maximize both its own utility 
(target-servicing rate) and the value it is adding to the multi- 
agent system. 
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